Tuning support vector machines for biomedical named entity recognition
Abstract
We explore the use of Support Vector Machines (SVMs) for biomedical named entity recognition. To make SVM training tractable on the largest available corpus, the GENIA corpus, we propose splitting the non-entity class into sub-classes using part-of-speech information. In addition, we explore new features such as a word cache and the states of an HMM trained by unsupervised learning. Experiments on the GENIA corpus show that our class-splitting technique not only makes training on the GENIA corpus feasible but also improves accuracy. The proposed new features also contribute to improved accuracy. We compare our SVM-based recognition system with a system that uses the Maximum Entropy tagging method.
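The class-splitting idea can be illustrated with a minimal sketch. The abstract does not give the exact scheme, so everything here is an assumption made for illustration: the `O-<POS>` label format, the function names `split_non_entity` and `merge_non_entity`, the optional `keep_pos` filter, and the toy sentence. Non-entity tokens are relabeled by their part-of-speech tag before training, and predicted sub-class labels are collapsed back to a single non-entity label afterwards.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (word, part-of-speech tag, BIO entity label); the layout is an assumption.
Token = Tuple[str, str, str]


def split_non_entity(tokens: Iterable[Token],
                     keep_pos: Optional[Set[str]] = None) -> List[Token]:
    """Relabel each non-entity token 'O' as 'O-<POS>'.

    If keep_pos is given, POS tags outside it share one 'O-other' sub-class
    (a hypothetical way to limit the number of sub-classes).
    """
    out: List[Token] = []
    for word, pos, label in tokens:
        if label == "O":
            sub = pos if keep_pos is None or pos in keep_pos else "other"
            label = f"O-{sub}"
        out.append((word, pos, label))
    return out


def merge_non_entity(labels: Iterable[str]) -> List[str]:
    """Collapse every 'O-<POS>' sub-class back to plain 'O' for evaluation."""
    return ["O" if lab == "O" or lab.startswith("O-") else lab for lab in labels]


# Toy sentence with illustrative (not corpus-derived) tags and labels.
toy = [("The", "DT", "O"), ("IL-2", "NN", "B-DNA"), ("gene", "NN", "I-DNA"),
       ("is", "VBZ", "O"), ("expressed", "VBN", "O")]

train_labels = [label for _, _, label in split_non_entity(toy)]
print(train_labels)
print(merge_non_entity(train_labels))
```

Under this sketch the classifier is trained on the split labels, and its predictions are passed through `merge_non_entity` before scoring, so evaluation still uses the original label set.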
Similar resources
Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines
This paper explores the scalability issues associated with solving the Named Entity Recognition (NER) problem using Support Vector Machines (SVM) and high-dimensional features. The performance results of a set of experiments conducted using binary and multi-class SVM with increasing training data sizes are examined. The NER domain chosen for these experiments is the biomedical publications doma...
Biomedical Named Entity Recognition Using Support Vector Machines: Performance vs. Scalability Issues
This paper examines the performance and scalability of Named Entity Recognition (NER) using multi-class Support Vector Machines (SVM) and high-dimensional features. The NER domain chosen for these experiments is the biomedical publications domain, especially selected due to its importance and inherent challenges. We use a simple machine learning approach that eliminates prior language knowledge...
Scalable biomedical Named Entity Recognition: investigation of a database-supported SVM approach
This paper explores scalability issues associated with the Named Entity Recognition problem in the biomedical publications domain using Support Vector Machines. The performance results using existing binary and multi-class SVMs with increasing training data are compared to results obtained using our new implementations. Our approach eliminates prior language or domain-specific knowledge and ach...
Annotating Multiple Types of Biomedical Entities: A Single Word Classification Approach
Named entity recognition is a fundamental task in biomedical data mining. Multiple-class annotation is more challenging than single-class annotation. In this paper, we took a single word classification approach to dealing with the multiple-class annotation problem using Support Vector Machines (SVMs). Word attributes, results of existing gene/protein name taggers, context, and other informati...
Named Entity Recognition using Maximum Entropy Models on Biologists’ Literature
With the explosion of online biomedical texts, it has become more difficult to find exact information manually. Named entity recognition is the very first step for further text mining tasks such as information extraction, knowledge discovery and others. In this paper, we present our statistical named entity recognition method. Until now, there were some approaches using different statistic...
Publication date: 2002